Predicting Medical School Admissions
Predicting Medical School Admissions
1 Background
In 2016-2017, 53,042 people applied to medical school, while approximately 40% of those applicants matriculated. As the number of applicants continues to increase year after year, so too does the competitive nature of medical school admissions. While many factors play a role in medical school admissions, it remains relatively opaque how objective metrics such as undergraduate grade point average (GPA), graduate school performance, and MCAT performance, etc. weigh in to the medical school admission process.
The typical medical school applicant will apply to around 15.6 schools, spending an average of $1900 in application fees alone App Cost. With only around 40% of applicants gaining admission to at least one medical school MCAT/GPA Grid, the admissions process is quite competitive. It would be great if potential applicants had a method for gauging the competitiveness of their application, prior to spending hundreds of dollars.
Undergraduate GPA and MCAT score have been identified by admissions committees as the most important factors in the pre-interview process in a 2014 Kaplan Survey of 78 medical schools.
2 Research Questions
Does the percentile on the Medical College Admission Test predict a successful applicant cycle?
Does the average in the core physiology classes predict a successful applicant cycle?
3 The Data
The data come from five classes of students in the medical physiology program, entering in the years 2011 - 2015. The complete, tidied dataset is a combination of applicant data from the Hobson applicant portal, semester grades collected on blackboard, an exit survey, and miscellaneous spreadsheets collected by interested advisors. Demographic data, undergraduate GPAs, and test scores were obtained through Hobson. The total physiology GPA was calculated from semester grades. Acceptance information was obtained through the exit survey as well as the advisor sheets.
A few drawbacks of this dataset include:
- Only a small subset of people completed the exit survey (~100/600).
- I suspect that the advisor sheets are not up to date.
- I have no way to quantify success at the interview stage.
- How many interviews did the student attend?
- I would have liked to know how many programs students applied to.
- I would have liked a count of hours spent shadowing/volunteering.
- Only the first MCAT score reported was used. (Some medical schools average scores, take the highest, or use the last score.)
- MCAT scores were reported on two scaling systems: 2015 (out of 528) 2015 Percentiles and before 2015 (out of 45) Old Percentiles. Percentiles were calculated (
pnorm()) based on the published standard deviations and means. Normal distributions were assumed, however I noticed some of the percentiles towards the extremes were off by about 2 points. - There was a fair amount (~30%) of MCAT score data missing. (We address this issue through a complete case analysis.)
- Where is the 2015 outcome data?
- Combining and cleaning the datasets took a very long time as neither student IDs/numbers were used on the grade or advisor sheets.
3.1 Data Load
# Survey Responses
outcomes <- read_excel("Exit_Survey_Raw.xlsx", sheet = "Sheet_1") %>%
filter(is.na(Last.name) != TRUE) #get rid of no names
# Grades/NBME/Demographics Entering 2011 Class
class2011_nbme <- read_excel("Grades 2011.xlsx", sheet = "Summary of Performance") %>%
filter(is.na(Last.name) != TRUE) # Remove trailing empty rows
# Grades/NBME/Demographics Entering 2012 Class
class2012_dems <- read_excel("Grades 2012.xlsx", sheet = "Student Summary") %>%
filter(is.na(Last.name) != TRUE) # Remove trailing empty rows
class2012_nbme <- read_excel("Grades 2012.xlsx", sheet = "Summary of Performance") %>%
filter(is.na(Last.name) != TRUE) # Remove trailing empty rows
# Grades/NBME/Demographics Entering 2013 Class
class2013_dems <- read_excel("Grades 2013.xlsx", sheet = "MS Roster") %>%
filter(is.na(Last.name) != TRUE) # Remove trailing empty rows
class2013_nbme <- read_excel("Grades 2013.xlsx", sheet = "Summary") %>%
filter(is.na(Last.name) != TRUE) # Remove trailing empty rows
# Grades/NBME/Demographics Entering 2014 Class
class2014_dems <- read_excel("Grades 2014.xlsx", sheet = "Summary") %>%
filter(is.na(Last.name) != TRUE) # Remove trailing empty rows
class2014_nbme <- read_excel("Grades 2014.xlsx", sheet = "Sheet1") %>%
filter(is.na(Last.name) != TRUE) # Remove trailing empty rows
class2014_data <- read_excel("Grades 2014.xlsx", sheet = "More_Data_2014") %>%
select(app.ID, Race, Sex)
#filter(is.na(Last.name) != TRUE) # Remove trailing empty rows
# Grades/NBME/Demographics Entering 2015 Class
class2015_dems <- read_excel("Grades 2015.xlsx", sheet = "Fall2015") %>%
filter(is.na(Last.name) != TRUE) # Remove trailing empty rows
class2015_nbme <- read_excel("Grades 2015.xlsx", sheet = "Summary1") %>%
filter(is.na(Last.name) != TRUE) # Remove trailing empty rows
# Grades/Demographics Entering 2016 Class
class2016_dems <- read_excel("Grades 2016.xlsx", sheet = "Sheet2") %>%
filter(is.na(Last.name) != TRUE) # Remove trailing empty rows
class2016_dems2 <- read_excel("Google Grades 2016.xlsx", sheet = "Sheet1") %>%
filter(is.na(Last.name) != TRUE) # Remove trailing empty rows
class2016_phol481 <- read_excel("Google Grades 2016.xlsx", sheet = "Phol481") %>%
filter(is.na(Last.name) != TRUE) # Remove trailing empty rows
class2016_phol483 <- read_excel("Google Grades 2016.xlsx", sheet = "Phol483") %>%
filter(is.na(Last.name) != TRUE) # Remove trailing empty rowsAs orginally loaded, there are 11 tibbles from 11 separate sheets in 7 excel files. There was at least one tibble for each year of data.
3.2 Tidying, Data Cleaning and Data Management
# Merge Demographic and NBME Tibbles for Each Class year
# Link by Last and First names
class2011 <- class2011_nbme %>%
mutate(Year = 2011) # Keep track of the Entry Year
class2012 <- full_join(class2012_dems, class2012_nbme,
by = c("Last.name", "First.name")) %>%
mutate(Year = 2012) # Keep track of the Entry Year
class2013 <- full_join(class2013_dems, class2013_nbme,
by = c("Last.name", "First.name")) %>%
mutate(Year = 2013) # Keep track of the Entry Year
class2014 <- full_join(class2014_dems, class2014_nbme,
by = c("Last.name", "First.name")) %>%
filter(is.na(app.ID) != TRUE) # Remove people with inadequate info
class2014 <- full_join(class2014, class2014_data,
by = c("app.ID")) %>%
mutate(Year = 2014) # Keep track of the Entry Year
class2015 <- full_join(class2015_dems, class2015_nbme,
by = c("Last.name", "First.name")) %>%
mutate(Year = 2015) # Keep track of the Entry Year
class2016 <- full_join(class2016_dems, class2016_dems2,
by = c("Last.name", "First.name")) %>%
full_join(class2016_phol481, by = c("Last.name", "First.name")) %>%
full_join(class2016_phol483, by = c("Last.name", "First.name")) %>%
mutate(Year = 2016) %>% # Keep track of the Entry Year
filter(is.na(app.ID) != TRUE) # Remove people with inadequate info
# Bind all the Years' Data Together
demographics <- bind_rows(class2011, class2012, class2013,
class2014, class2015, class2016)
# Join the outcomes
dem.outcomes <- full_join(demographics, outcomes,
by = ignore.case(c("Last.name", "First.name")))Combine the information for each year’s data, linking by first and last names.
dems <- dem.outcomes %>%
mutate(record.ID = seq(1, nrow(dem.outcomes)), # Number the people
# Fill in Values for NAs
Accepted = ifelse((is.na(Accepted) == TRUE), 0, Accepted),
Accepted.Survey = ifelse(is.na(Accepted.Survey) == T, 0, Accepted.Survey),
# Use the Survey MCAT Values to Update Original Score
MCAT.Survey.Last = ifelse(is.na(MCAT.Survey.Score2) == T,
MCAT.Survey.Score1, MCAT.Survey.Score2),
MCAT.T = ifelse(is.na(MCAT.Survey.Last) == T, MCAT.T, MCAT.Survey.Last),
# Use the Survey to Update Accepted Schools
School.Acceptances = ifelse(is.na(School.Acceptances.Survey) == T,
School.Acceptances, School.Acceptances.Survey),
# Some of the values were on 4.00 scale
PHOL483 = ifelse(PHOL483 <= 4, (PHOL483 + 1)*20, PHOL483),
PHOL484 = ifelse(PHOL484 <= 4, (PHOL484 + 1)*20, PHOL484),
# Calculate Weighted 100 Pt Scale
Phys.GPA = ifelse(is.na(Phys.GPA) == TRUE,
(PHOL481*(6/16) + PHOL483*(6/16) +
PHOL482*(2/16) + PHOL484*(2/16)),
Phys.GPA),
# Convert 100 to 4.00 Scale
Grad.GPA = ifelse(is.na(Grad.GPA) == TRUE,
Phys.GPA/20 - 1, Grad.GPA),
Ugrad.GPA = round(Ugrad.GPA, 2),
# Calculate MCAT Percentiles based on AAMC Data/Normal Distribution
MCAT.Percentile = round(ifelse(MCAT.T > 470,
pnorm(MCAT.T, mean = 499.6, sd = 10.4, lower.tail = T)*100,
pnorm(MCAT.T, mean = 25.2, sd = 6.4, lower.tail = T)*100),0),
# Create Factor Variables
Accepted.F = factor(Accepted, levels = c(0, 1),
labels = c("No", "Yes")),
Sex.F = factor(Sex, levels = c(0, 1),
labels = c("F", "M")),
Race.F = factor(Race, levels = c(0, 1, 2, 3, 4),
labels = c("White", "Asian",
"Hispanic", "Black", "Other")),
Race.2 = ifelse(Race <= 1, 0, 1),
Race.F2 = factor(Race.2, levels = c(0, 1),
labels = c("ORM", "URM")),
# Combine some variables
Accepted = Accepted + Accepted.Survey, #Update with survey info
Accepted = ifelse(Accepted >= 1, 1, 0), #Update with survey info
Answered.Survey = ifelse(is.na(Answered.Survey) == TRUE, 0, Answered.Survey),
# Interested in Med School?
Program.Choice = ifelse(is.na(MD.DDS)==T,
Program.Choice, MD.DDS),
Program.Choice = ifelse(is.na(Program.Choice)==T, 1, Program.Choice))
#write.csv(dems, "Full_MSMP_432Project.csv") ## this is for Ben
# Select only the Variables needed
dems.subset <- dems %>%
select(record.ID, Phys.GPA, NBME.Percentile, Ugrad.GPA,
MCAT.Percentile, Accepted, Accepted.F, Year, Race, Race.F, Race.2, Race.F2,
Sex, Sex.F, Program.Choice)Create a few new variables, including a record ID. Use information from the exit survey to update the advisor sheets regarding acceptances and MCAT scores. Calculate MCAT percentiles from the orginal scores. Collapse the race variable into two categories, underrepresented minorities and not. Select only the students interested in medical school.
# Only the complete cases
Leo <- dems %>%
filter(Program.Choice == 1 & Year < 2015) %>% # Select only the people interested in Med School
select(record.ID, Phys.GPA, NBME.Percentile, Ugrad.GPA,
MCAT.Percentile, Accepted, Accepted.F, Year, Race.2, Race.F2,
Sex, Sex.F) %>%
na.omit() # Select for only the Complete Cases
#write.csv(Leo, "dID_MSMPdata_Complete.csv")# Data from years 2011 - 2014 with missing data
# Including the complete cases (Leo)
Trish <- dems %>%
filter(Program.Choice == 1 & Year < 2015) %>%
# Remove people with missing data from all Courses/NBME
filter(is.na(PHOL481 & PHOL482 & PHOL483 & PHOL484 & NBME.Percentile)!=TRUE) %>%
# Select Variables
select(record.ID, PHOL481, PHOL482, PHOL483, PHOL484,
Phys.GPA, NBME.Percentile, Ugrad.GPA,
MCAT.Percentile, Accepted, Year, Race.2, Race.F2, Sex,
Sex.F)3.3 Tidied Tibbles
There were two tibbles used in this project:
- The complete case with 148 rows and 12 columns (variables)
glimpse(Leo)Observations: 148
Variables: 12
$ record.ID <int> 46, 48, 50, 51, 52, 53, 54, 57, 60, 61, 62, 64...
$ Phys.GPA <dbl> 86.43946, 91.43783, 86.17240, 83.81837, 80.469...
$ NBME.Percentile <dbl> 53, 73, 9, 39, 22, 56, 76, 7, 70, 2, 53, 49, 9...
$ Ugrad.GPA <dbl> 3.31, 3.29, 3.14, 3.10, 2.89, 3.70, 3.33, 3.03...
$ MCAT.Percentile <dbl> 94, 86, 31, 82, 43, 43, 43, 77, 43, 77, 61, 49...
$ Accepted <dbl> 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
$ Accepted.F <fctr> Yes, Yes, Yes, No, No, Yes, Yes, Yes, Yes, Ye...
$ Year <dbl> 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012...
$ Race.2 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1...
$ Race.F2 <fctr> ORM, ORM, ORM, ORM, ORM, ORM, ORM, ORM, ORM, ...
$ Sex <dbl> 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1...
$ Sex.F <fctr> M, F, F, F, F, M, M, M, M, M, F, F, M, F, F, ...
- The imputed case with 286 rows and 15 columns (variables)
glimpse(Trish)Observations: 286
Variables: 15
$ record.ID <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15,...
$ PHOL481 <dbl> 90.38848, 90.92727, 97.74667, 91.70970, 89.303...
$ PHOL482 <dbl> 93.36, 87.80, 95.15, 87.94, 86.07, 89.50, 98.0...
$ PHOL483 <dbl> 92.12000, 86.10667, 92.74667, 89.93333, 87.546...
$ PHOL484 <dbl> 90.32806, 85.00459, 91.58776, 89.81071, 88.084...
$ Phys.GPA <dbl> 91.40169, 87.98830, 94.77722, 90.33498, 88.087...
$ NBME.Percentile <dbl> 94, 94, 91, 91, 84, 84, 81, 76, 76, 73, 73, 70...
$ Ugrad.GPA <dbl> 2.81, 3.04, 3.41, 3.51, 3.38, 3.80, 3.89, 3.26...
$ MCAT.Percentile <dbl> 89, 77, 92, 86, 94, 77, 82, 43, 67, 77, 82, 77...
$ Accepted <dbl> 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1...
$ Year <dbl> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011...
$ Race.2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
$ Race.F2 <fctr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
$ Sex <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
$ Sex.F <fctr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
3.4 Code Book
| Variable | Type | Notes |
|---|---|---|
| record.ID | Record ID | Linking number for each student |
| PHOL481 | Quant | Med Phys 1 Semester Grade |
| PHOL482 | Quant | Trans Phys 1 Semester Grade |
| PHOL483 | Quant | Med Phys 2 Semester Grade |
| PHOL484 | Quant | Trans Phys 2 Semester Grade |
| Phys.GPA | Quant | Physiology GPA (100 point scale) |
| NBME.Percentile | Quant | NBME Percentile (100 point scale) |
| Ugrad.GPA | Quant | Undergraduate GPA (4.00 scale) |
| MCAT.Percentile | Quant | MCAT Percentile (100 point scale) |
| Accepted | Binary Outcome | Accepted to Medical School? (Yes/No) |
| Year | Categorical | Year entering program |
| Race.F2 | Factor | Race (White or non-white) |
| Sex.F | Factor | Sex (Male or Female) |
| Program.Choice | Binary | Program Choice (Indcator Var) |
3.5 Table 1 Complete Cases
# Make a Table 1 for the Complete Cases
vars <- c("Phys.GPA", "NBME.Percentile",
"Ugrad.GPA", "MCAT.Percentile",
"Race.F2", "Sex.F", "Accepted")
fVars <- c("Race.F2", "Sex.F", "Accepted")
CreateTableOne(vars = vars, factorVars = fVars, strata = "Year", data = Leo) Stratified by Year
2012 2013 2014
n 55 20 73
Phys.GPA (mean (sd)) 87.26 (5.01) 88.95 (7.09) 87.36 (7.23)
NBME.Percentile (mean (sd)) 46.44 (28.75) 55.15 (28.57) 46.12 (28.28)
Ugrad.GPA (mean (sd)) 3.26 (0.29) 3.13 (0.37) 3.20 (0.30)
MCAT.Percentile (mean (sd)) 61.55 (21.88) 67.25 (19.34) 59.84 (23.89)
Race.F2 = URM (%) 4 ( 7.3) 6 (30.0) 13 (17.8)
Sex.F = M (%) 27 (49.1) 9 (45.0) 40 (54.8)
Accepted = 1 (%) 45 (81.8) 13 (65.0) 34 (46.6)
Stratified by Year
p test
n
Phys.GPA (mean (sd)) 0.576
NBME.Percentile (mean (sd)) 0.432
Ugrad.GPA (mean (sd)) 0.266
MCAT.Percentile (mean (sd)) 0.432
Race.F2 = URM (%) 0.042
Sex.F = M (%) 0.676
Accepted = 1 (%) <0.001
# Make a Correlation Matrix with Complete Cases
pairs(~ Accepted + Phys.GPA + Ugrad.GPA + NBME.Percentile +
MCAT.Percentile + Race.F2 + Sex.F,
data = Leo,
main = "Correlation Matrix - Complete Cases",
upper.panel = panel.smooth,
diag.panel = panel.hist,
lower.panel = panel.cor)3.6 Table 1 Imputed Cases
# Make a Table 1 for the Complete Cases
vars <- c("Phys.GPA", "NBME.Percentile",
"Ugrad.GPA", "MCAT.Percentile",
"Race.F2", "Sex.F", "Accepted")
fVars <- c("Race.F2", "Sex.F", "Accepted")
CreateTableOne(vars = vars, factorVars = fVars, strata = "Year", data = Trish) Stratified by Year
2011 2012 2013
n 42 72 70
Phys.GPA (mean (sd)) 84.53 (5.27) 85.80 (5.65) 85.92 (6.72)
NBME.Percentile (mean (sd)) 37.64 (32.32) 41.32 (30.56) 44.26 (29.24)
Ugrad.GPA (mean (sd)) 3.23 (0.28) 3.24 (0.30) 3.13 (0.29)
MCAT.Percentile (mean (sd)) 61.26 (24.19) 61.19 (21.54) 69.91 (19.31)
Race.F2 = URM (%) 0 ( NaN) 10 (14.9) 24 (36.9)
Sex.F = M (%) 0 ( NaN) 32 (45.1) 33 (50.8)
Accepted = 1 (%) 35 (83.3) 53 (73.6) 35 (50.0)
Stratified by Year
2014 p test
n 102
Phys.GPA (mean (sd)) 85.85 (7.76) 0.701
NBME.Percentile (mean (sd)) 42.72 (29.07) 0.711
Ugrad.GPA (mean (sd)) 3.20 (0.31) 0.222
MCAT.Percentile (mean (sd)) 60.79 (23.77) 0.374
Race.F2 = URM (%) 17 (18.3) NaN
Sex.F = M (%) 55 (53.9) NaN
Accepted = 1 (%) 43 (42.2) <0.001
The NaN occurs because demographic data was completely absent from 2011.
4 Analyses
Create a logistic regression model to predict the probability of at least one medical school acceptance based on the following variables:
Phys.GPA: Student’s overall Physiology Grades (100 pt scale) from core classesNBME.Percentile: Percentile score on the NBME Physiology Subject ExamUgrad.GPA: Undergraduate GPAMCAT.Percentile: Percentile on the Medical College Admission TestRace.F2: Student’s race - ORM (white/Asian), URM (Hispanic/African American/Other)Sex.F: Sex - Male or Female
4.1 Complete: KS
# Use all the predictors
dd <- datadist(Leo)
options(datadist = "dd")
lrm.model.1 <- lrm(Accepted ~ Phys.GPA + Ugrad.GPA + NBME.Percentile +
MCAT.Percentile + Race.F2 + Sex.F, data = Leo,
x = TRUE, y = TRUE)
lrm.model.1Logistic Regression Model
lrm(formula = Accepted ~ Phys.GPA + Ugrad.GPA + NBME.Percentile +
MCAT.Percentile + Race.F2 + Sex.F, data = Leo, x = TRUE,
y = TRUE)
Model Likelihood Discrimination Rank Discrim.
Ratio Test Indexes Indexes
Obs 148 LR chi2 7.22 R2 0.065 C 0.633
0 56 d.f. 6 g 0.534 Dxy 0.266
1 92 Pr(> chi2) 0.3011 gr 1.707 gamma 0.267
max |deriv| 3e-12 gp 0.121 tau-a 0.126
Brier 0.224
Coef S.E. Wald Z Pr(>|Z|)
Intercept -6.7208 3.6941 -1.82 0.0689
Phys.GPA 0.0907 0.0404 2.24 0.0249
Ugrad.GPA -0.0212 0.5904 -0.04 0.9714
NBME.Percentile -0.0230 0.0099 -2.32 0.0205
MCAT.Percentile 0.0088 0.0086 1.02 0.3065
Race.F2=URM -0.1885 0.4996 -0.38 0.7060
Sex.F=M -0.0915 0.3578 -0.26 0.7982
Include all of the variables. The C statistic is low, at 0.633. The \(R^2\) is also very low, 0.065. Of note, only Phys.GPA and NBME.Percentile are significant. This is unexpected, as I thought the NBME was a useless exam in terms of medical school admissions (otherwise they would ask for it). I would have expected MCAT, Race, and undergraduate GPA to be important predictors.
# plot the anova values
plot(anova(lrm.model.1))Again, NBME.Percentile and Phys.GPA are important here.
plot(summary(lrm.model.1))lrm.model.1.coef <- tidy(summary(lrm.model.1))
# Extract Even Rows and Confidence Intervals
lrm.model.1.tidy <- lrm.model.1.coef[c(FALSE, TRUE),
c("Effect", "Lower.0.95", "Upper.0.95")]
# set rownames
invisible((setattr(lrm.model.1.tidy, "row.names",
c("Phys.GPA", "Ugrad.GPA", "NBME.Percentile", "MCAT.Percentile",
"Race.R2", "Sex.F - F:M"))))
pander(lrm.model.1.tidy, caption = "Odds Ratios: KS Complete Case")| Â | Effect | Lower.0.95 | Upper.0.95 |
|---|---|---|---|
| Phys.GPA | 2.003 | 1.092 | 3.676 |
| Ugrad.GPA | 0.9917 | 0.6279 | 1.566 |
| NBME.Percentile | 0.3409 | 0.1372 | 0.8471 |
| MCAT.Percentile | 1.381 | 0.744 | 2.562 |
| Race.R2 | 0.8282 | 0.3111 | 2.205 |
| Sex.F - F:M | 1.096 | 0.5435 | 2.21 |
Summary of the effects of each predictor, showing the effect on Accepted when moving from the 25th to the 75th percentile of each variable while holding the other variables at a prespecified level. As already seen in the ANOVA analysis,Phys.GPA and NBME.Percentile have the largest effect on Accepted. Increases in Phys.GPA doubles the likelihood of acceptance (OR: 2.003; 95% CI: 1.092, 3.676). Likewise, increases in NBME.Percentile decrease the likelihood of acceptance by about 66% (OR: 0.3409; 95% CI: 0.1372, 0.8471). Ugrad.GPA, MCAT.Percentile, Race.R2, and Sex all had effect sizes that crossed 1, reflecting what the ANOVA plot indicated in that Phys.GPA and NBME.Percentile are the most important terms in this model.
4.1.1 Calibration
n=148 Mean absolute error=0.028 Mean squared error=0.00133
0.9 Quantile of absolute error=0.067
This calibration plot is quite terrible. This model underpredicts for probabilities under 0.6 and over predicts at probabilities above 0.7.
4.1.2 Nomogram
plot(nomogram(lrm.model.1, fun=plogis, funlabel = "Probability of Acceptance"))The nomogram highlights some important features. First, it is surprising how little of an impact undergraduate GPA has on the probability of acceptance; MCAT performance weighs more heavily, but is still not great at raising the probability of acceptance. Performance in the core physiology curriculum, however, does seem to have a large influence on the eventual probability. This was an expected finding, one which we thought we would see go hand-in-hand with undergraduate GPA and MCAT percentile. ‘Sex’ was right where we expected, having little overall effect, with females slightly favored. ‘Race’ was also a surprise: we expected race to have more weight in acceptance probability, much less the virtual 0 influence it has as predicted by this kitchen sink model.
Overall, we are surprised to see how much Medical Physiology performance influences probability of acceptance as compared to other factors, namely undergraduate GPA and MCAT percentile.
4.1.3 Plot Predictions: Kitchen Sink
plot.ks.gpa <- ggplot(Predict(lrm.model.1, Phys.GPA = 60:105, Race.F2, Sex.F, fun=plogis)) +
geom_point(aes(x = Phys.GPA, y = Accepted), data = Leo) +
theme_bw() +
labs(x = "Physiology Average",
y = "Pr(Acceptance)",
#title = "Model 1 - Kitchen Sink Predictions",
subtitle = "Across Race and Sex, holding all other predictors at their medians")
plot.ks.nbme <- ggplot(Predict(lrm.model.1, NBME.Percentile = 0:100, Race.F2, Sex.F, fun=plogis)) +
geom_point(aes(x = NBME.Percentile, y = Accepted), data = Leo) +
theme_bw() +
labs(x = "NBME Percentile",
y = "Pr(Acceptance)",
title = "Model 1 - Kitchen Sink Predictions",
subtitle = "Across Race and Sex, holding all other predictors at their medians")
plot.ks.mcat <- ggplot(Predict(lrm.model.1, MCAT.Percentile = 0:100, Race.F2, Sex.F, fun=plogis)) +
geom_point(aes(x = MCAT.Percentile, y = Accepted), data = Leo) +
theme_bw() +
labs(x = "MCAT Percentile",
y = "Pr(Acceptance)",
#title = "Model 1 - Kitchen Sink Predictions",
subtitle = "Across Race and Sex, holding all other predictors at their medians")
grid.arrange(plot.ks.nbme, plot.ks.gpa, plot.ks.mcat, nrow = 3)Holding Medical Physiology grade at a median value of 87.81, undergraduate GPA at 3.22, and MCAT percentile at 66.5, probability of acceptance declines with NBME performance. Again, this is unexpected. No significant differences between Sex or Race.
Holding undergraduate GPA at 3.22, NBME percentile at 47, and MCAT percentile at 66.5, probability of acceptance increases with Medical Physiology grade. Again, significant differences between Sex or Race.
Holding Medical Physiology grade at 87.81, undergraduate GPA at 3.22, and NBME Percentile at 47, probability of acceptance increased with MCAT percentile, although our model has suggested that MCAT percentile was not significant.
4.1.4 Validate
set.seed(314); validate(lrm.model.1) index.orig training test optimism index.corrected n
Dxy 0.2651 0.3476 0.1908 0.1568 0.1083 40
R2 0.0648 0.1158 0.0358 0.0800 -0.0152 40
Intercept 0.0000 0.0000 0.2260 -0.2260 0.2260 40
Slope 1.0000 1.0000 0.5284 0.4716 0.5284 40
Emax 0.0000 0.0000 0.1846 0.1846 0.1846 40
D 0.0420 0.0822 0.0199 0.0623 -0.0202 40
U -0.0135 -0.0135 0.0226 -0.0361 0.0226 40
Q 0.0555 0.0957 -0.0026 0.0983 -0.0428 40
B 0.2238 0.2126 0.2353 -0.0227 0.2465 40
g 0.5345 0.7332 0.3725 0.3607 0.1738 40
gp 0.1212 0.1577 0.0859 0.0718 0.0493 40
A negative \(R^2\) is a mathematical impossibility. The optimism here is quite high in both the Dxy and R2.
glm.model.1 <- glm(Accepted ~ Phys.GPA + Ugrad.GPA + NBME.Percentile +
MCAT.Percentile + Race.F2 + Sex.F, data = Leo,
family="binomial"(link="logit"))
par(mfrow=c(1,2))
plot(glm.model.1, which = c(4,5))These plots did not help us identify any outliers. We only checked because some of the points in the prediction plots do not make sense.
4.2 Complete: NL
plot(spearman2(Accepted ~ Phys.GPA + Ugrad.GPA + NBME.Percentile +
MCAT.Percentile + Race.F2 + Sex.F, data = Leo))Here, we plot a Spearman rho2 plot to see if maybe there are some non-linear terms we can add. Fit a cubic spline to the Phys.GPA. Include interaction terms between Race and NBME.Percentile. Orginally we wanted to include interaction terms between Race and Phys.GPA, however the lrm() fit yielded singular values.
dd <- datadist(Leo)
options(datadist = "dd")
lrm.model.nl <- lrm(Accepted ~ rcs(Phys.GPA, 5) + Ugrad.GPA + NBME.Percentile*Race.F2 +
MCAT.Percentile + Sex.F, data = Leo,
x = TRUE, y = TRUE)
lrm.model.nlLogistic Regression Model
lrm(formula = Accepted ~ rcs(Phys.GPA, 5) + Ugrad.GPA + NBME.Percentile *
Race.F2 + MCAT.Percentile + Sex.F, data = Leo, x = TRUE,
y = TRUE)
Model Likelihood Discrimination Rank Discrim.
Ratio Test Indexes Indexes
Obs 148 LR chi2 16.47 R2 0.143 C 0.692
0 56 d.f. 10 g 0.836 Dxy 0.384
1 92 Pr(> chi2) 0.0870 gr 2.308 gamma 0.384
max |deriv| 8e-09 gp 0.181 tau-a 0.182
Brier 0.210
Coef S.E. Wald Z Pr(>|Z|)
Intercept -10.0653 10.1622 -0.99 0.3220
Phys.GPA 0.1219 0.1229 0.99 0.3214
Phys.GPA' 0.2120 0.5476 0.39 0.6986
Phys.GPA'' -1.5845 5.0256 -0.32 0.7525
Phys.GPA''' 0.4855 9.8094 0.05 0.9605
Ugrad.GPA 0.0590 0.6311 0.09 0.9255
NBME.Percentile -0.0200 0.0111 -1.79 0.0727
Race.F2=URM 0.3980 0.9414 0.42 0.6725
MCAT.Percentile 0.0106 0.0090 1.17 0.2400
Sex.F=M -0.1050 0.3724 -0.28 0.7780
NBME.Percentile * Race.F2=URM -0.0161 0.0202 -0.80 0.4253
This non-linear model has a Nagelkerke R2 of 0.143, far from 1. The C statistic is 0.692, an improvement (compared to 0.633), but still suggests poor predictive ability by this measure. The Brier score is 0.210, where 0 is optimal. This model is a slight improvement with the risk of overfitting.
lrm.model.nl.coef <- tidy(summary(lrm.model.nl))
# Extract Even Rows and Confidence Intervals
lrm.model.nl.tidy <- lrm.model.nl.coef[c(FALSE, TRUE),
c("Effect", "Lower.0.95", "Upper.0.95")]
# set rownames
invisible((setattr(lrm.model.nl.tidy, "row.names",
c("Phys.GPA", "Ugrad.GPA", "NBME.Percentile", "MCAT.Percentile",
"Race.F2 - URM:ORM", "Sex.F - F:M"))))
pander(lrm.model.nl.tidy, caption = "Odds Ratios: Non-linear Complete Case, NBME = 47, Race = ORM")| Â | Effect | Lower.0.95 | Upper.0.95 |
|---|---|---|---|
| Phys.GPA | 3.325 | 1.002 | 11.03 |
| Ugrad.GPA | 1.024 | 0.628 | 1.668 |
| NBME.Percentile | 0.3927 | 0.1415 | 1.09 |
| MCAT.Percentile | 1.477 | 0.7707 | 2.829 |
| Race.F2 - URM:ORM | 0.6995 | 0.2426 | 2.017 |
| Sex.F - F:M | 1.111 | 0.5353 | 2.304 |
Summary of the effects of each predictor, showing the effect on ‘Accepted’ when moving from the 25th to the 75th percentile of each variable while holding the other variables at a prespecified level. Increases in ‘Phys.GPA’ increases the likelihood of acceptance (OR: 3.325; 95% CI: 1.002, 11.03). ‘Ugrad.GPA’, ‘MCAT.Percentile’, ‘Race.R2’, and ‘Sex’ all had effect sizes that crossed 1.
4.2.1 Calibration
plot(calibrate(lrm.model.nl))
n=148 Mean absolute error=0.054 Mean squared error=0.00371
0.9 Quantile of absolute error=0.082
The calibration plot for this non-linear model also under-predicts for probabilities under 0.6 and over-predicts at probabilities above 0.6. It, however, yields a mean absolute error of 0.054 (vs. 0.033 in the Kitchen Sink model) and a mean squared error of 0.00371 (vs. 0.00177 in the Kitchen Sink model).
4.2.2 ROC
4.2.3 Nomogram
plot(nomogram(lrm.model.nl, fun=plogis, funlabel = "Probability of Acceptance"))While the model was slightly improved by adding a non-linear term and an interaction term, the nomogram tells an interesting story. Undergraduate GPA, MCAT percentile, and Sex remain in similar positions as in the Kitchen Sink model. Better performance in the core physiology courses increases probability of acceptance, to an extent; >90% performance becomes deleterious. NBME.Percentile has a much larger effect for underrepresented minorities, with the odd effect that a better NBME percentile worsens the probability of acceptance. This is the opposite effect we expected.
4.2.4 Plot Predictions
plot.nl.nbme <- ggplot(Predict(lrm.model.nl, NBME.Percentile = 0:100, Race.F2, Sex.F, fun=plogis)) +
geom_point(aes(x = NBME.Percentile, y = Accepted), data = Leo) +
theme_bw() +
labs(x = "NBME Percentile",
y = "Pr(Acceptance)",
title = "Model 2 - Nonlinear Predictions",
subtitle = "Across Race and Sex, holding all other predictors at their medians")
plot.nl.gpa <- ggplot(Predict(lrm.model.nl, Phys.GPA = 60:105, Race.F2, Sex.F, fun=plogis)) +
geom_point(aes(x = Phys.GPA, y = Accepted), data = Leo) +
theme_bw() +
labs(x = "Weighted Physiology Average",
y = "Pr(Acceptance)",
title = "Model 2 - Nonlinear Predictions",
subtitle = "Across Race and Sex, holding all other predictors at their medians")
plot.nl.mcat <- ggplot(Predict(lrm.model.nl, MCAT.Percentile = 0:100, Race.F2, Sex.F, fun=plogis)) +
geom_point(aes(x = MCAT.Percentile, y = Accepted), data = Leo) +
theme_bw() +
labs(x = "MCAT Percentile",
y = "Pr(Acceptance)",
#title = "Model 1 - Kitchen Sink Predictions",
subtitle = "Across Race and Sex, holding all other predictors at their medians")
grid.arrange(plot.nl.nbme, plot.nl.gpa, plot.nl.mcat, nrow = 3)Holding Medical Physiology grades at 87.81, undergraduate GPA at 3.22, and MCAT percentile at 66.5, probability of acceptance again declines with NBME performance. Probability of acceptance declines more precipitously within the underrepresented minorities group. This could be because the lack of data we have for underrepresented students.
Holding undergraduate GPA at 3.22, NBME percentile at 47, and MCAT percentile at 66.5, the probability of acceptance increases with Medical Physiology grade, again, to an extent. This may have to do with some limitations of our data set, which we will address later.
glm.model.nl <- glm(Accepted ~ rcs(Phys.GPA, 5) + Ugrad.GPA + NBME.Percentile*Race.F2 +
MCAT.Percentile + Sex.F, data = Leo,
family="binomial"(link="logit"))
par(mfrow=c(1,2))
plot(glm.model.nl, which = c(4,5))These plots did not help us identify any outliers. We only checked because some of the points in the prediction plots do not make sense.
4.2.5 Validate
set.seed(314); validate(lrm.model.nl) index.orig training test optimism index.corrected n
Dxy 0.3839 0.4661 0.2908 0.1753 0.2086 40
R2 0.1433 0.2152 0.0843 0.1309 0.0125 40
Intercept 0.0000 0.0000 0.2041 -0.2041 0.2041 40
Slope 1.0000 1.0000 0.5607 0.4393 0.5607 40
Emax 0.0000 0.0000 0.1671 0.1671 0.1671 40
D 0.1045 0.1657 0.0575 0.1082 -0.0037 40
U -0.0135 -0.0135 0.0441 -0.0576 0.0441 40
Q 0.1180 0.1792 0.0134 0.1658 -0.0478 40
B 0.2100 0.1952 0.2286 -0.0334 0.2434 40
g 0.8363 1.0893 0.5972 0.4921 0.3442 40
gp 0.1810 0.2174 0.1336 0.0838 0.0972 40
4.3 Imputation
# Check patterns in missingness
marginplot(Trish[, c("MCAT.Percentile", "Ugrad.GPA")])We are missing 30 counts of Undergraduate GPA and 85 counts of MCAT Percentiles. There are 19 students with both missing. Undergrad GPA is probably missing at random. MCAT Percentile is probably not.
aggr(Trish,
numbers = TRUE, sortVARS = TRUE,
labels = names(Trish), cex.axis = 0.5,
gap = 3, ylab = c("Missing Data", "Pattern"))# MCAT Percentile is missing > 30% of data
# Ugrad gpa/Sex is only missing in arond 10%
# Missing grades for the 2016 cohort
# Missing Acceptances for 2016 cohortUgrad.GPA, MCAT.Percentile, Race.F, Sex.F are all missing at least 10% of the data. MCAT.Percentile is missing about 30%.
# Non Missing Outcomes
# Initialize Imputation Stuff
imp.t <- mice(Trish, max=0, print=FALSE)
# Predictor Matrix
pred <- imp.t$predictorMatrix
# record ID & Program Choice aren't used as predictors
pred[, c("record.ID", "Sex")] <- 0
# Run with 50 frames and 10 iterations
imp.t <- mice(Trish, pred = pred,
m = 4, maxit = 10, seed = 271828, print = F)
# Combine all of the imputation frames in a long dataset
imp.t.complete <- mice::complete(imp.t, 'long')
imp.single.3 <- as.data.frame(mice::complete(imp.t, 3)) # save frame 3 as a dataframe# Non missing Outcomes
# Density plots for Imputed Values
densityplot(imp.t)The density histograms were plotted for each imputed frame (pink) against the observed values (blue). While the ugrad GPA has a similar distribtuion to the observed values, the MCAT percentiles were more symmetric than was observed in the exisiting data. The imputed frames give more weight to the lower end of the percentile range. This is annoying.
4.3.1 Pooled Analyses
# Non Linear
dd <- datadist(imp.single.3)
options(datadist = "dd")
lrm.model.nl.imp <- with(imp.t, lrm(Accepted ~ rcs(Phys.GPA, 5) + Ugrad.GPA + NBME.Percentile*Race.F2 +
MCAT.Percentile + Sex.F, data = Leo,
x = TRUE, y = TRUE))
nl.imp.t.pool <- pool(lrm.model.nl.imp)
nl.imp.t.results <- summary(pool(lrm.model.nl.imp))
pander(round(nl.imp.t.results, 2), caption = "Non-Exponentiated Imputed Pooled Results")| Â | est | se | t | df | Pr(>|t|) | lo 95 | hi 95 |
|---|---|---|---|---|---|---|---|
| Intercept | -10.07 | 10.16 | -0.99 | 996575 | 0.32 | -29.98 | 9.85 |
| Phys.GPA | 0.12 | 0.12 | 0.99 | 996575 | 0.32 | -0.12 | 0.36 |
| Phys.GPA’ | 0.21 | 0.55 | 0.39 | 996575 | 0.7 | -0.86 | 1.29 |
| Phys.GPA’’ | -1.58 | 5.03 | -0.32 | 996575 | 0.75 | -11.43 | 8.27 |
| Phys.GPA’’’ | 0.49 | 9.81 | 0.05 | 996575 | 0.96 | -18.74 | 19.71 |
| Ugrad.GPA | 0.06 | 0.63 | 0.09 | 996575 | 0.93 | -1.18 | 1.3 |
| NBME.Percentile | -0.02 | 0.01 | -1.79 | 996575 | 0.07 | -0.04 | 0 |
| Race.F2=URM | 0.4 | 0.94 | 0.42 | 996575 | 0.67 | -1.45 | 2.24 |
| MCAT.Percentile | 0.01 | 0.01 | 1.17 | 996575 | 0.24 | -0.01 | 0.03 |
| Sex.F=M | -0.1 | 0.37 | -0.28 | 996575 | 0.78 | -0.83 | 0.62 |
| **NBME.Percentile * Race.F2=URM** | -0.02 | 0.02 | -0.8 | 996575 | 0.43 | -0.06 | 0.02 |
| Â | nmis | fmi | lambda |
|---|---|---|---|
| Intercept | NA | 0 | 0 |
| Phys.GPA | 0 | 0 | 0 |
| Phys.GPA’ | NA | 0 | 0 |
| Phys.GPA’’ | NA | 0 | 0 |
| Phys.GPA’’’ | NA | 0 | 0 |
| Ugrad.GPA | 30 | 0 | 0 |
| NBME.Percentile | 0 | 0 | 0 |
| Race.F2=URM | NA | 0 | 0 |
| MCAT.Percentile | 85 | 0 | 0 |
| Sex.F=M | NA | 0 | 0 |
| **NBME.Percentile * Race.F2=URM** | NA | 0 | 0 |
# Exponentiate the Pooled results to get odds ratios
pander(round(exp(nl.imp.t.results[, c("est", "lo 95", "hi 95")]), 2),
caption = "Odds Ratios: Non-Linear, Multiply Imputed Predictors")| Â | est | lo 95 | hi 95 |
|---|---|---|---|
| Intercept | 0 | 0 | 19003 |
| Phys.GPA | 1.13 | 0.89 | 1.44 |
| Phys.GPA’ | 1.24 | 0.42 | 3.62 |
| Phys.GPA’’ | 0.21 | 0 | 3887 |
| Phys.GPA’’’ | 1.62 | 0 | 363584801 |
| Ugrad.GPA | 1.06 | 0.31 | 3.65 |
| NBME.Percentile | 0.98 | 0.96 | 1 |
| Race.F2=URM | 1.49 | 0.24 | 9.42 |
| MCAT.Percentile | 1.01 | 0.99 | 1.03 |
| Sex.F=M | 0.9 | 0.43 | 1.87 |
| **NBME.Percentile * Race.F2=URM** | 0.98 | 0.95 | 1.02 |
Nothing is significant here. Note the extremely high Upper 95% CI. The pooling did not work well here.
par(mfrow = c(2,2))
invisible(with(imp.t, plot(calibrate(lrm.model.nl, fun=plogis, funlabel = "Probability of Acceptance"))))
n=148 Mean absolute error=0.047 Mean squared error=0.00321
0.9 Quantile of absolute error=0.085
n=148 Mean absolute error=0.051 Mean squared error=0.00348
0.9 Quantile of absolute error=0.085
n=148 Mean absolute error=0.046 Mean squared error=0.00305
0.9 Quantile of absolute error=0.08
n=148 Mean absolute error=0.042 Mean squared error=0.00278
0.9 Quantile of absolute error=0.081
par(mfrow = c(1,1))Calibration plots look really similar to the non-linear complete case.
4.4 Validate Using an Imputed Frame
# Kitchen sink using new data
e <- datadist(imp.single.3)
options(datadist = "e")
#ks.imp.3 <- lrm(Accepted ~ Phys.GPA + Ugrad.GPA + NBME.Percentile +
#MCAT.Percentile + Race.2 + Sex, data = imp.single.3,
#x = TRUE, y = TRUE)
pred.ks.imp.3 <- predict(lrm.model.1, newdata = data.frame(imp.single.3), type = "fitted")
pred.nl.imp.3 <- predict(lrm.model.nl, newdata = data.frame(imp.single.3), type = "fitted")# Compare Models 1 and 2 based on Prediction Error Summaries
errors_ks <- imp.single.3$Accepted - pred.ks.imp.3
errors_nl <- imp.single.3$Accepted - pred.nl.imp.3
mape_ks <- mean(abs(errors_ks))
mape_nl <- mean(abs(errors_nl))
mspe_ks <- mean(errors_ks^2)
mspe_nl <- mean(errors_nl^2)
cor_ks <- cor(imp.single.3$Accepted, pred.ks.imp.3)
cor_nl <- cor(imp.single.3$Accepted, pred.nl.imp.3)
Cstat_ks <- rcorr.cens(pred.ks.imp.3, imp.single.3$Accepted)[1]
Cstat_nl <- rcorr.cens(pred.nl.imp.3, imp.single.3$Accepted)[1]
validation.table <- data_frame(
Model = c("KS Complete", "NL Complete"),
MAPE = round(c(mape_ks, mape_nl),3),
MSPE = round(c(mspe_ks, mspe_nl),3),
'Corr with Admit' = round(c(cor_ks, cor_nl),3),
'Est. C' = round(c(Cstat_ks, Cstat_nl),3))
pander(validation.table,
caption = "Validating on an Imputed Frame")| Model | MAPE | MSPE | Corr with Admit | Est. C |
|---|---|---|---|---|
| KS Complete | 0.460 | 0.232 | 0.225 | 0.640 |
| NL Complete | 0.432 | 0.216 | 0.341 | 0.701 |
5 Comparison: Complete Cases
| Model | Dxy | C | R2 | Brier | AIC | BIC |
|---|---|---|---|---|---|---|
| KS Complete | 0.266 | 0.633 | 0.065 | 0.224 | 203 | 224 |
| NL Complete | 0.384 | 0.692 | 0.143 | 0.210 | 202 | 235 |
The R2 for both the Kitchen Sink and Nonlinear models are pretty poor, although the nonlinear model’s R2 is slightly better. The AIC may as well be identical for the two models, while the Kitchen Sink has a better BIC. Using the table alone, the nonlinear is a more attractive option. However, we do have some concerns with the nonlinear model, referenced above, that steers us towards the kitchen sink model instead.
6 Conclusions
While higher MCAT Percentiles were associated with increased probabilities of acceptance, the effect size was not as large as we expected. In our dataset, MCAT was not significant. This is likely reflective of the fact that we only had one MCAT score per student; it is likely that students scoring below the 70th percentile retook the exam before ultimately gaining acceptance to medical school.s
The average scores in the core physiology courses were predictive of acceptance. Via the kitchen sink model, students with better performance were more likely to gain acceptance. The non-linear model also predicted this up to around an average of 90%, where the likelihood of acceptance plummets.
Given the findings of our models, there are some real issues with this dataset:
- For students not gaining admission to school, there was no indication as to whether or not they applied.
- Only the first MCAT score that the student applied with to the graduate program was available. We have no way of knowing if he/she retook the exam unless they answered the exit survey.
- Interview performance is also a very important factor in admissions, and this was completely left out of this analysis.
- We learned:
- Minimize free form boxes on surveys. People think differently and like to provide a mixture of text and numbers. People also spell their names differently from time to time.
- We really like the
full_joinfunction. Also this%>%is very convenient for tidying tibbles. - Model validation with multiple imputations is non-trivial.
Opportunities for Further Investigation:
- Get a better dataset
# Obtain R version/Platform/OS
sessionInfo()R version 3.3.3 (2017-03-06)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] grid stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] dplyr_0.5.0 purrr_0.2.2 readr_1.0.0
[4] tidyr_0.6.1 tibble_1.2 tidyverse_1.1.1
[7] searchable_0.3.3.1 broom_0.4.2 VIM_4.6.0
[10] data.table_1.9.6 colorspace_1.3-2 mitools_2.3
[13] forcats_0.2.0 rms_5.1-0 SparseM_1.76
[16] Hmisc_4.0-2 ggplot2_2.2.1 Formula_1.2-1
[19] survival_2.40-1 lattice_0.20-34 miceadds_2.4-12
[22] mice_2.30 readxl_0.1.1 ROCR_1.0-7
[25] gplots_3.0.1 pander_0.6.0 gridExtra_2.2.1
[28] tableone_0.7.3 leaps_3.0 arm_1.9-3
[31] lme4_1.1-12 Matrix_1.2-8 MASS_7.3-45
[34] rmdformats_0.3.2 knitr_1.15.1
loaded via a namespace (and not attached):
[1] backports_1.0.5 blme_1.0-4 sm_2.2-5.4
[4] plyr_1.8.4 igraph_1.0.1 grouped_0.6-0
[7] GPArotation_2014.11-1 lazyeval_0.2.0 sp_1.2-4
[10] splines_3.3.3 polycor_0.7-9 TH.data_1.0-8
[13] digest_0.6.12 htmltools_0.3.5 gdata_2.17.0
[16] magrittr_1.5 checkmate_1.8.2 sirt_1.15-41
[19] cluster_2.0.5 sfsmisc_1.1-0 modelr_0.1.0
[22] MCMCpack_1.3-9 sandwich_2.3-4 lpSolve_5.6.13
[25] rvest_0.3.2 haven_1.0.0 jsonlite_1.3
[28] zoo_1.7-14 ape_4.1 multiwayvcov_1.2.3
[31] gtable_0.2.0 MatrixModels_0.4-1 sjmisc_2.3.1
[34] questionr_0.5 mirt_1.23 car_2.1-4
[37] kernlab_0.9-25 DEoptimR_1.0-8 abind_1.4-5
[40] scales_0.4.1 mvtnorm_1.0-6 DBI_0.6
[43] miniUI_0.1.1 Rcpp_0.12.9 laeken_0.4.6
[46] xtable_1.8-2 htmlTable_1.9 foreign_0.8-67
[49] stats4_3.3.3 survey_3.31-5 vcd_1.4-3
[52] httr_1.2.1 htmlwidgets_0.8 RColorBrewer_1.1-2
[55] lavaan_0.5-23.1097 acepack_1.4.1 nnet_7.3-12
[58] labeling_0.3 reshape2_1.4.2 munsell_0.4.3
[61] tools_3.3.3 evaluate_0.10 stringr_1.2.0
[64] yaml_2.1.14 mcmc_0.9-4 robustbase_0.92-7
[67] ic.infer_1.1-5 caTools_1.17.1 nlme_3.1-131
[70] mime_0.5 quantreg_5.29 lavaan.survey_1.1.3.1
[73] xml2_1.1.1 compiler_3.3.3 pbkrtest_0.4-6
[76] rstudioapi_0.6 e1071_1.6-8 kappalab_0.4-7
[79] pbivnorm_0.6.0 stringi_1.1.2 highr_0.6
[82] cubature_1.3-6 TAM_2.0-37 psych_1.6.12
[85] nloptr_1.0.4 tensorA_0.36 stringdist_0.9.4.4
[88] CDM_5.5-21 lmtest_0.9-35 combinat_0.0-8
[91] bitops_1.0-6 corpcor_1.6.9 httpuv_1.3.3
[94] MCMCglmm_2.24 R6_2.2.0 latticeExtra_0.6-28
[97] bookdown_0.3 KernSmooth_2.23-15 codetools_0.2-15
[100] polspline_1.1.12 boot_1.3-18 gtools_3.5.0
[103] assertthat_0.1 chron_2.3-50 rprojroot_1.2
[106] mnormt_1.5-5 multcomp_1.4-6 hms_0.3
[109] WrightMap_1.2.1 mgcv_1.8-17 parallel_3.3.3
[112] quadprog_1.5-5 rpart_4.1-10 class_7.3-14
[115] coda_0.19-1 minqa_1.2.4 rmarkdown_1.3
[118] lubridate_1.6.0 shiny_1.0.0 base64enc_0.1-3